
    Comparative recommender system evaluation: Benchmarking recommendation frameworks

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in RecSys '14, Proceedings of the 8th ACM Conference on Recommender Systems, http://dx.doi.org/10.1145/2645710.2645746

    Recommender systems research is often based on comparisons of predictive accuracy: the better the evaluation scores, the better the recommender. However, it is difficult to compare results from different recommender systems due to the many options in the design and implementation of an evaluation strategy. Additionally, algorithmic implementations can diverge from the standard formulation due to manual tuning and modifications that work better in some situations. In this work we compare common recommendation algorithms as implemented in three popular recommendation frameworks. To provide a fair comparison, we have complete control of the evaluation dimensions being benchmarked: dataset, data splitting, evaluation strategies, and metrics. We also include results using the internal evaluation mechanisms of these frameworks. Our analysis points to large differences in recommendation accuracy across frameworks and strategies, i.e., the same baselines may perform orders of magnitude better or worse across frameworks. Our results show the necessity of clear guidelines when reporting the evaluation of recommender systems, to ensure reproducibility and comparability of results.

    This work was partly carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship Programme. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements nº 246016 and nº 610594, and from the Spanish Ministry of Science and Innovation (TIN2013-47090-C3-2).
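
    A minimal sketch of the controlled-evaluation idea described above: the split and the metric live outside any framework, so every framework is trained on identical data and scored by identical code. The helper names (load_ratings, the frameworks dict, the adapter interface) are illustrative assumptions, not code from the paper.

    ```python
    import random
    import math

    def split_ratings(ratings, test_fraction=0.2, seed=42):
        """Deterministic random split so every framework sees identical data."""
        rng = random.Random(seed)
        shuffled = ratings[:]            # list of (user, item, rating) tuples
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_fraction))
        return shuffled[:cut], shuffled[cut:]

    def rmse(predictions, test):
        """One shared metric implementation, applied to all frameworks alike."""
        errors = [(predictions[(u, i)] - r) ** 2
                  for u, i, r in test if (u, i) in predictions]
        return math.sqrt(sum(errors) / len(errors))

    # Hypothetical usage: each framework adapter trains on `train` and returns
    # {(user, item): predicted_rating} for the pairs in `test`.
    # train, test = split_ratings(load_ratings("ratings.csv"))
    # for name, adapter in frameworks.items():
    #     print(name, rmse(adapter.train_and_predict(train, test), test))
    ```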

    Improving accountability in recommender systems research through reproducibility

    Reproducibility is a key requirement for scientific progress. It allows the work of others to be reproduced and, as a consequence, the reported claims and results to be fully trusted. In this work, we argue that, by facilitating the reproducibility of recommender systems experimentation, we indirectly address the issues of accountability and transparency in recommender systems research from the perspective of practitioners, designers, and engineers aiming to assess the capabilities of published research works. These issues have become increasingly prevalent in recent literature. Reasons for this include societal movements around intelligent systems and artificial intelligence striving toward the fair and objective use of human behavioral data (as in Machine Learning, Information Retrieval, or Human–Computer Interaction). Society has grown to expect explanations and transparency standards regarding the underlying algorithms making automated decisions for and around us. This work surveys existing definitions of these concepts and proposes a coherent terminology for recommender systems research, with the goal of connecting reproducibility to accountability. We achieve this by introducing several guidelines and steps that lead to reproducible and, hence, accountable experimental workflows and research. We additionally analyze several instantiations of recommender system implementations available in the literature and discuss the extent to which they fit into the introduced framework. With this work, we aim to shed light on this important problem and facilitate progress in the field by increasing the accountability of research.

    This work has been funded by the Ministerio de Ciencia, Innovación y Universidades (reference: PID2019-108965GB-I00).
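
    One concrete instantiation of a reproducible, and hence auditable, workflow of the kind the paper argues for: make every run self-describing by persisting its full configuration, its seeds, and its environment alongside the results. This is a sketch under assumptions; the field names and the placeholder experiment are illustrative, not the paper's framework.

    ```python
    import json
    import platform
    import random
    import time

    def run_experiment(config):
        random.seed(config["seed"])      # pin every source of randomness
        # ... train and evaluate the recommender here (placeholder) ...
        return {"rmse": 0.91}            # illustrative result

    config = {
        "dataset": "ml-100k",
        "split": {"strategy": "random", "test_fraction": 0.2},
        "algorithm": {"name": "item-knn", "neighbors": 50},
        "seed": 42,
    }
    record = {
        "config": config,
        "results": run_experiment(config),
        "environment": {"python": platform.python_version()},
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with open("experiment_log.json", "a") as f:
        f.write(json.dumps(record) + "\n")   # append-only, auditable log
    ```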

    Coherence and Inconsistencies in Rating Behavior - Estimating the Magic Barrier of Recommender Systems

    Recommender systems have to deal with a wide variety of users and user types that express their preferences in different ways. This difference in user behavior can have a profound impact on the performance of the recommender system. Users receive better (or worse) recommendations depending on the quantity and the quality of the information the system knows about them. Specifically, the inconsistencies in users' preferences impose a lower bound on the error the system may achieve when predicting ratings for one particular user; this is referred to as the magic barrier. In this work, we present a mathematical characterization of the magic barrier based on the assumption that user ratings are afflicted with inconsistencies (noise). Furthermore, we propose a measure of the consistency of user ratings (rating coherence) that predicts the performance of recommendation methods. More specifically, we show that user coherence is correlated with the magic barrier; we exploit this correlation to discriminate between easy users (those with a lower magic barrier) and difficult ones (those with a higher magic barrier). We report experiments where the recommendation error for the more coherent users is lower than that of the less coherent ones. We further validate these results using two public datasets, where the necessary data to identify the magic barrier is not available, and obtain similar performance improvements.

    This research was in part supported by the Spanish Ministry of Economy, Industry and Competitiveness (TIN2016-80630-P).
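
    The intuition behind the noise assumption, as a sketch: if an observed rating is a true preference plus noise, no predictor's RMSE can fall below roughly the root mean square of that noise, which can be estimated when users re-rate the same items. This follows the general assumption stated above, not the paper's exact estimator; the data layout is illustrative.

    ```python
    import math

    def magic_barrier(rerated):
        """rerated: {(user, item): [rating1, rating2, ...]} from repeated trials.
        Treats the mean of the repeats as the 'true' rating and the
        deviations from it as noise; returns sqrt(mean squared noise)."""
        squared_noise, n = 0.0, 0
        for ratings in rerated.values():
            mean = sum(ratings) / len(ratings)
            for r in ratings:
                squared_noise += (r - mean) ** 2
                n += 1
        return math.sqrt(squared_noise / n)

    # Toy example: a fairly coherent user vs. an inconsistent one.
    print(magic_barrier({("u1", "i1"): [4, 4, 5], ("u1", "i2"): [2, 2, 2]}))  # ~0.33
    print(magic_barrier({("u2", "i1"): [1, 5, 3], ("u2", "i2"): [5, 1, 4]}))  # ~1.7
    ```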

    Replicable Evaluation of Recommender Systems

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in RecSys '15, Proceedings of the 9th ACM Conference on Recommender Systems, http://dx.doi.org/10.1145/2792838.2792841

    Recommender systems research is by and large based on comparisons of recommendation algorithms' predictive accuracy: the better the evaluation metrics (higher accuracy scores or lower predictive errors), the better the recommendation algorithm. Comparing the evaluation results of two recommendation approaches is, however, a difficult process, as there are many factors to be considered in the implementation of an algorithm, its evaluation, and how datasets are processed and prepared. This tutorial shows how to present evaluation results in a clear and concise manner, while ensuring that the results are comparable, replicable, and unbiased. These insights are not limited to recommender systems research alone, but are also valid for experiments with other types of personalized interactions and contextual information access.

    Supported in part by the Ministerio de Educación y Ciencia (TIN2013-47090-C3-2).
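
    A small sketch of one of the evaluation factors at stake: the candidate-item strategy. The same ranked list yields different precision@k depending on whether all items stay in the ranking or only items with known judgments do, so the strategy must be reported for results to be comparable. The names and the toy data here are illustrative, not the tutorial's code.

    ```python
    def precision_at_k(ranked_items, relevant, k):
        """Fraction of the top-k recommended items that are relevant."""
        hits = sum(1 for item in ranked_items[:k] if item in relevant)
        return hits / k

    ranking = ["i1", "i2", "i3", "i4", "i5"]
    relevant = {"i2", "i5"}

    # Strategy A: rank among all items (unjudged items count as misses).
    print(precision_at_k(ranking, relevant, k=5))                 # 0.4

    # Strategy B: condensed list, keeping only items with known judgments.
    judged = {"i2", "i4", "i5"}
    condensed = [i for i in ranking if i in judged]
    print(precision_at_k(condensed, relevant, k=len(condensed)))  # ~0.67
    ```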

    Building user profiles based on sequences for content and collaborative filtering

    Modeling user profiles is a necessary step for most information filtering systems, such as recommender systems, to provide personalized recommendations. However, most of them work with users or items as vectors, applying different types of mathematical operations between them and neglecting sequential or content-based information. Hence, in this paper we propose an adaptive mechanism to obtain user sequences using different sources of information, allowing the generation of hybrid recommendations as a seamless, transparent technique from the system viewpoint. As a proof of concept, we develop the Longest Common Subsequence (LCS) algorithm as a similarity metric to compare user sequences. In the process of adapting this algorithm to recommendation, we include different parameters: to control efficiency by reducing the information used in the algorithm (preference filter), to decide when a neighbor is considered useful enough to be included in the process (confidence filter), to identify whether two interactions are equivalent (matching threshold), and to normalize the length of the LCS in a bounded interval (normalization functions). These parameters can be extended to work with any type of sequential algorithm. We evaluate our approach against several state-of-the-art recommendation algorithms using different evaluation metrics measuring the accuracy, diversity, and novelty of the recommendations, and analyze the impact of the proposed parameters. We have found that our approach offers competitive performance, outperforming content-based, collaborative, and hybrid baselines, and producing positive results when either content- or rating-based information is exploited.

    This article has been co-funded by the European Social Fund (ESF), within the 2017 call for predoctoral contracts, and the Spanish Ministry of Economy, Industry and Competitiveness (project reference: TIN2016-80630-P).
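
    A minimal sketch of LCS over interaction sequences used as a user similarity, in the spirit described above. The `match` predicate stands in for the matching threshold, and dividing by the longer sequence is one possible normalization function; both are illustrative simplifications of the paper's parameters.

    ```python
    def lcs_length(seq_a, seq_b, match=lambda a, b: a == b):
        """Classic dynamic-programming LCS; `match` decides when two
        interactions are considered equivalent."""
        dp = [[0] * (len(seq_b) + 1) for _ in range(len(seq_a) + 1)]
        for i, a in enumerate(seq_a, 1):
            for j, b in enumerate(seq_b, 1):
                dp[i][j] = (dp[i - 1][j - 1] + 1 if match(a, b)
                            else max(dp[i - 1][j], dp[i][j - 1]))
        return dp[-1][-1]

    def lcs_similarity(seq_a, seq_b):
        """Normalize the LCS into [0, 1] so sequence length does not
        dominate (one possible normalization function)."""
        if not seq_a or not seq_b:
            return 0.0
        return lcs_length(seq_a, seq_b) / max(len(seq_a), len(seq_b))

    # Toy usage: item-id sequences of two users, oldest interaction first.
    print(lcs_similarity(["i1", "i3", "i4", "i7"],
                         ["i1", "i4", "i7", "i9"]))  # 0.75
    ```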

    Applying reranking strategies to route recommendation using sequence-aware evaluation

    Venue recommendation approaches have become particularly useful nowadays due to the increasing number of users registered in location-based social networks (LBSNs), applications where it is possible to share the venues someone has visited and establish connections with other users in the system. Besides, the venue recommendation problem has certain characteristics that differ from traditional recommendation, and it can also benefit from other contextual aspects to recommend not only independent venues, but complete routes or sequences of related locations. Hence, in this paper we investigate the problem of route recommendation from the perspective of generating a sequence of meaningful locations for the users, analyzing both their personal interests and the intrinsic relationships between the venues. We divide this problem into three stages, proposing general solutions to each: first, we state a general methodology to derive user routes from LBSN datasets that can be applied in as many scenarios as possible; second, we define a reranking framework that generates sequences of items from recommendation lists using different techniques; and third, we propose an evaluation metric that captures both accuracy and sequentiality at the same time. We report experiments on several LBSN datasets using different recommendation quality metrics and algorithms. We have found that classical recommender systems are comparable to algorithms specifically tailored for this task, although exploiting the temporal dimension, in general, helps to improve the performance of these techniques; additionally, the proposed reranking strategies show promising results in terms of finding a trade-off between relevance, sequentiality, and distance, essential dimensions in both venue and route recommendation tasks.

    This work has been funded by the Ministerio de Ciencia, Innovación y Universidades (reference: TIN2016-80630-P) and by the European Social Fund (ESF), within the 2017 call for predoctoral contracts.
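
    One simple way to picture the reranking stage, as a sketch: start from a relevance-ranked candidate list and greedily pick the next venue by trading off its recommendation score against the distance from the previously chosen venue. The lambda weight, the score values, and the coordinates are illustrative assumptions; the paper's framework covers several such techniques.

    ```python
    import math

    def haversine_km(p, q):
        """Great-circle distance between two (lat, lon) points in km."""
        lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2)
             * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(a))

    def rerank_route(candidates, coords, start, length=5, lam=0.5):
        """candidates: {venue: relevance}; coords: {venue: (lat, lon)}.
        Greedily balances relevance against distance from the last stop."""
        route, current, pool = [], start, dict(candidates)
        while pool and len(route) < length:
            best = max(pool, key=lambda v: lam * pool[v]
                       - (1 - lam) * haversine_km(coords[current], coords[v]))
            route.append(best)
            current = best
            del pool[best]
        return route

    # Toy usage: three candidate venues, starting from the user's location.
    coords = {"home": (40.40, -3.70), "v1": (40.41, -3.70),
              "v2": (40.42, -3.69), "v3": (40.45, -3.60)}
    print(rerank_route({"v1": 0.9, "v2": 0.8, "v3": 0.85}, coords, "home"))
    ```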

    Recommender system performance evaluation and prediction: information retrieval perspective

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 201

    Self-adjusting hybrid recommenders based on social network analysis

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, http://dx.doi.org/10.1145/2009916.2010092

    Ensemble recommender systems successfully enhance recommendation accuracy by exploiting different sources of user preferences, such as ratings and social contacts. In linear ensembles, the optimal weight of each recommender strategy is commonly tuned empirically, with limited guarantee that such weights remain optimal afterwards. We propose a self-adjusting hybrid recommendation approach that alleviates the social cold-start situation by weighting the recommender combination dynamically at recommendation time, based on social network analysis algorithms. We show empirical results where our approach outperforms the best static combination for different hybrid recommenders.

    This work was supported by the Spanish Ministry of Science and Innovation (TIN2008-06566-C04-02), the Universidad Autónoma de Madrid, and the Community of Madrid (CCG10-UAM/TIC-5877).
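
    A sketch of a dynamically weighted linear hybrid in the spirit described above: instead of a fixed weight tuned offline, the social component's weight is recomputed per user at recommendation time from the social graph. Using degree centrality is an illustrative choice here; the paper draws on social network analysis algorithms more generally, and all names below are assumptions.

    ```python
    def degree_centrality(graph, user):
        """graph: {user: set of friends}. Normalized degree in [0, 1]."""
        n = len(graph)
        return len(graph.get(user, ())) / (n - 1) if n > 1 else 0.0

    def hybrid_score(user, item, cf_score, social_score, graph):
        """Linear ensemble whose weight is set at recommendation time:
        well-connected users lean more on the social component."""
        w = degree_centrality(graph, user)
        return w * social_score(user, item) + (1 - w) * cf_score(user, item)

    # Toy usage: a rating-cold user with many friends relies on social data.
    graph = {"u1": {"u2", "u3", "u4"}, "u2": {"u1"},
             "u3": {"u1"}, "u4": {"u1"}}
    cf = lambda u, i: 0.2        # weak rating-based evidence (cold start)
    social = lambda u, i: 0.8    # strong evidence from friends' preferences
    print(hybrid_score("u1", "i1", cf, social, graph))  # 0.8
    ```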